The Initial Study of Term Vector Generation Methods for News Summarization
Abstract
In this paper, I present an initial study of new term vector generation methods. Random Manhattan Indexing and the Skip-gram model were introduced as novel techniques for term vector generation with interesting features. The purpose of this study is to determine whether these methods are suitable for Summec: A Summarization Engine for Czech. Summec already uses heuristic, TF-IDF, and Latent Semantic Analysis (LSA) methods for news article summarization. I test the quality of the generated vectors on Summec's evaluation set and compare the resulting summaries with those of the existing summarization methods. The novel summarization methods perform 2 % worse than the LSA method. The evaluation set contains 50 newspaper articles, each annotated by 15 persons. The ROUGE toolkit is used to compare the generated summaries with the human references. The above-mentioned evaluation set and the Summec demo are available online at http://nlp.ite.tul.cz/sumarizace.
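To make the evaluation step concrete, the following is a minimal sketch of a ROUGE-1-style unigram recall averaged over several human references. It is not the ROUGE toolkit itself and not Summec's evaluation code; the function names `tokenize`, `rouge1_recall`, and `mean_rouge1_recall` are hypothetical, and the simple regex tokenizer is an assumption (the abstract does not say how Czech text is preprocessed).

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on non-word characters.
    # A real Czech pipeline would likely add lemmatization or stemming.
    return re.findall(r"\w+", text.lower())


def rouge1_recall(candidate, reference):
    # Fraction of reference unigrams (with multiplicity) that also
    # occur in the candidate summary.
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    return overlap / total


def mean_rouge1_recall(candidate, references):
    # Plain average over all human references; the official toolkit
    # offers other aggregation options that are not modeled here.
    if not references:
        return 0.0
    return sum(rouge1_recall(candidate, r) for r in references) / len(references)
```

With 15 annotators per article, as in the evaluation set described above, `references` would hold the 15 human summaries for one article and `candidate` the summary produced by one of the methods.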
Similar resources
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides its own merits, text classification (TC) has become a cornerstone in many applications. Work presented here is part of and a pre-requisite for a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
Headliner: An integrated headline suggestion system
Headline generation is a short-form variant of document summarization that has been studied in natural language processing. This paper presents a case study examining the application of several different headline generation models at The Washington Post. Currently for individual news articles, multiple different headlines are manually written in order to target different platforms such as the w...
What is in the news on a subject: automatic and sparse summarization of large document corpora
News media play a significant role in our political and daily lives. The traditional approach in media analysis to news summarization is labor intensive. As the amount of news data grows rapidly, the need is acute for automatic and scalable methods to aid media analysis researchers so that they could screen corpora of news articles very quickly before detailed reading. In this paper we propose ...
Different Methods of Long-Term Electric Load Demand Forecasting a Comprehensive Review
Long-term demand forecasting presents the first step in planning and developing future generation, transmission and distribution facilities. One of the primary tasks of an electric utility is to accurately predict load demand requirements at all times, especially for the long term. Based on the outcome of such forecasts, utilities coordinate their resources to meet the forecasted demand using a least-co...
Towards Improving Abstractive Summarization via Entailment Generation
Abstractive summarization, the task of rewriting and compressing a document into a short summary, has achieved considerable success with neural sequence-to-sequence models. However, these models can still benefit from stronger natural language inference skills, since a correct summary is logically entailed by the input document, i.e., it should not contain any contradictory or unrelated informat...
Publication date: 2015